187 research outputs found

    Generalization in Reinforcement Learning by Soft Data Augmentation

    Extensive efforts have been made to improve the generalization ability of Reinforcement Learning (RL) methods via domain randomization and data augmentation. However, as more factors of variation are introduced during training, optimization becomes increasingly challenging and may empirically result in lower sample efficiency and unstable training. Instead of learning policies directly from augmented data, we propose SOft Data Augmentation (SODA), a method that decouples augmentation from policy learning. Specifically, SODA imposes a soft constraint on the encoder that aims to maximize the mutual information between latent representations of augmented and non-augmented data, while the RL optimization process uses strictly non-augmented data. Empirical evaluations are performed on diverse tasks from the DeepMind Control Suite as well as a robotic manipulation task, and we find SODA to significantly advance sample efficiency, generalization, and stability in training over state-of-the-art vision-based RL methods.
    Comment: Website: https://nicklashansen.github.io/SODA/ Code: https://github.com/nicklashansen/dmcontrol-generalization-benchmark. Presented at the International Conference on Robotics and Automation (ICRA) 2021.
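    To make the decoupling concrete: the policy gradient only ever sees clean frames, while an auxiliary loss pulls the encoder's view of augmented frames toward its view of clean ones. Below is a minimal PyTorch sketch of such a BYOL-style surrogate objective; the network sizes, the augment callable, and the mean-squared-error form are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Tiny convolutional encoder standing in for the agent's image encoder."""
    def __init__(self, latent_dim=50):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fc = nn.Linear(32, latent_dim)

    def forward(self, x):
        return self.fc(self.conv(x))

def soda_style_loss(encoder, predictor, target_encoder, obs, augment):
    """Pull latents of augmented frames toward target latents of clean frames,
    a common surrogate for maximizing their mutual information."""
    z_aug = predictor(encoder(augment(obs)))  # online branch sees augmented data
    with torch.no_grad():
        z_clean = target_encoder(obs)         # target branch sees clean data only
    return F.mse_loss(F.normalize(z_aug, dim=-1), F.normalize(z_clean, dim=-1))
```

    In a full training loop, the RL loss (e.g., an off-policy actor-critic objective) would be computed on non-augmented observations only, and target_encoder would track the online encoder via an exponential moving average.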

    TD-MPC2: Scalable, Robust World Models for Continuous Control

    TD-MPC is a model-based reinforcement learning (RL) algorithm that performs local trajectory optimization in the latent space of a learned implicit (decoder-free) world model. In this work, we present TD-MPC2: a series of improvements upon the TD-MPC algorithm. We demonstrate that TD-MPC2 improves significantly over baselines across 104 online RL tasks spanning 4 diverse task domains, achieving consistently strong results with a single set of hyperparameters. We further show that agent capabilities increase with model and data size, and successfully train a single 317M parameter agent to perform 80 tasks across multiple task domains, embodiments, and action spaces. We conclude with an account of lessons, opportunities, and risks associated with large TD-MPC2 agents. Explore videos, models, data, code, and more at https://nicklashansen.github.io/td-mpc2
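    The core loop, sampling action sequences, rolling them out in latent space, and refitting the sampling distribution toward high-return candidates, can be sketched compactly. The MPPI-style update below is an illustrative PyTorch sketch; the dynamics, reward, and value interfaces, the action dimension, and all hyperparameters are assumptions rather than the released implementation (which, among other things, also mixes in samples from a learned policy prior).

```python
import torch

def plan(z, dynamics, reward, value, horizon=5, samples=256, iters=6,
         action_dim=6, temperature=0.5):
    """MPPI-style planning in latent space. z: (1, latent_dim) current latent;
    dynamics(z, a) -> next latent; reward(z, a), value(z) -> (samples,)."""
    mean = torch.zeros(horizon, action_dim)
    std = torch.ones(horizon, action_dim)
    for _ in range(iters):
        actions = (mean + std * torch.randn(samples, horizon, action_dim)).clamp(-1, 1)
        z_t = z.expand(samples, -1)
        ret = torch.zeros(samples)
        for t in range(horizon):
            ret = ret + reward(z_t, actions[:, t])
            z_t = dynamics(z_t, actions[:, t])
        ret = ret + value(z_t)  # bootstrap with the terminal value estimate
        w = torch.softmax(ret / temperature, dim=0)  # weight high-return samples
        mean = (w[:, None, None] * actions).sum(0)
        std = (w[:, None, None] * (actions - mean) ** 2).sum(0).sqrt().clamp(min=0.1)
    return mean[0]  # execute the first action, then replan at the next step
```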

    MoDem-V2: Visuo-Motor World Models for Real-World Robot Manipulation

    Robotic systems that aspire to operate in uninstrumented real-world environments must perceive the world directly via onboard sensing. Vision-based learning systems aim to eliminate the need for environment instrumentation by building an implicit understanding of the world based on raw pixels, but navigating the contact-rich, high-dimensional search space from solely sparse visual reward signals significantly exacerbates the challenge of exploration. The applicability of such systems is thus typically restricted to simulated or heavily engineered environments, since agent exploration in the real world without the guidance of explicit state estimation and dense rewards can lead to unsafe behavior and catastrophic safety failures. In this study, we isolate the root causes behind these limitations to develop a system, called MoDem-V2, capable of learning contact-rich manipulation directly in the uninstrumented real world. Building on the latest algorithmic advancements in model-based reinforcement learning (MBRL), demo-bootstrapping, and effective exploration, MoDem-V2 can acquire contact-rich dexterous manipulation skills directly in the real world. We identify key ingredients for leveraging demonstrations in model learning while respecting real-world safety considerations -- exploration centering, agency handover, and actor-critic ensembles. We empirically demonstrate the contribution of these ingredients in four complex visuo-motor manipulation problems in both simulation and the real world. To the best of our knowledge, our work presents the first successful system for demonstration-augmented visual MBRL trained directly in the real world. Visit https://sites.google.com/view/modem-v2 for videos and more details.
    Comment: 9 pages, 8 figures.
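    Of the three named ingredients, actor-critic ensembles are the most self-contained to illustrate: disagreement across critics serves as an epistemic-uncertainty signal that can gate how much agency is handed to the learned policy. The sketch below is a loose PyTorch interpretation; the gating rule, threshold, and network sizes are assumptions, not the paper's mechanism.

```python
import torch
import torch.nn as nn

class CriticEnsemble(nn.Module):
    """Ensemble of Q-networks; disagreement across members is used as an
    epistemic-uncertainty proxy."""
    def __init__(self, obs_dim, act_dim, n=5, hidden=256):
        super().__init__()
        self.members = nn.ModuleList([
            nn.Sequential(nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                          nn.Linear(hidden, 1))
            for _ in range(n)
        ])

    def forward(self, obs, act):
        x = torch.cat([obs, act], dim=-1)
        qs = torch.stack([m(x).squeeze(-1) for m in self.members])  # (n, batch)
        return qs.mean(0), qs.std(0)  # value estimate, epistemic spread

def gated_action(policy_act, fallback_act, critic, obs, max_std=0.5):
    """Hand agency to the learned policy only where the ensemble agrees;
    otherwise stay with safe, demonstration-like behavior (hypothetical rule)."""
    _, std = critic(obs, policy_act)
    return torch.where(std[:, None] < max_std, policy_act, fallback_act)
```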

    Finetuning Offline World Models in the Real World

    Reinforcement Learning (RL) is notoriously data-inefficient, which makes training on a real robot difficult. While model-based RL algorithms (world models) improve data-efficiency to some extent, they still require hours or days of interaction to learn skills. Recently, offline RL has been proposed as a framework for training RL policies on pre-existing datasets without any online interaction. However, constraining an algorithm to a fixed dataset induces a state-action distribution shift between training and inference, and limits its applicability to new tasks. In this work, we seek to get the best of both worlds: we consider the problem of pretraining a world model with offline data collected on a real robot, and then finetuning the model on online data collected by planning with the learned model. To mitigate extrapolation errors during online interaction, we propose to regularize the planner at test-time by balancing estimated returns and (epistemic) model uncertainty. We evaluate our method on a variety of visuo-motor control tasks in simulation and on a real robot, and find that our method enables few-shot finetuning to seen and unseen tasks even when offline data is limited. Videos, code, and data are available at https://yunhaifeng.com/FOWM.
    Comment: CoRL 2023 Oral; Project website: https://yunhaifeng.com/FOWM
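    The test-time regularization admits a compact reading: score each candidate action sequence by its model-predicted return plus a value bootstrap, minus a penalty proportional to how much a Q-ensemble disagrees about that value, so that candidates the model is unsure about rank lower even when their estimated return is high. A minimal sketch, where the penalty weight and the ensemble interface are assumptions:

```python
import torch

def regularized_score(rollout_returns, terminal_qs, lam=1.0):
    """Score candidate action sequences for the planner.
    rollout_returns: (samples,) accumulated model-predicted rewards
    terminal_qs: (n_critics, samples) value bootstraps from a Q-ensemble"""
    epistemic = terminal_qs.std(dim=0)  # ensemble disagreement as uncertainty
    return rollout_returns + terminal_qs.mean(dim=0) - lam * epistemic
```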

    Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers

    We propose to address quadrupedal locomotion tasks using Reinforcement Learning (RL) with a Transformer-based model that learns to combine proprioceptive information and high-dimensional depth sensor inputs. While learning-based locomotion has made great advances using RL, most methods still rely on domain randomization for training blind agents that generalize to challenging terrains. Our key insight is that proprioceptive states only offer contact measurements for immediate reaction, whereas an agent equipped with visual sensory observations can learn to proactively maneuver through environments with obstacles and uneven terrain by anticipating changes in the environment many steps ahead. In this paper, we introduce LocoTransformer, an end-to-end RL method for quadrupedal locomotion that leverages a Transformer-based model for fusing proprioceptive states and visual observations. We evaluate our method in challenging simulated environments with different obstacles and uneven terrain. We show that our method obtains significant improvements over policies with only proprioceptive state inputs, and that Transformer-based models further improve generalization across environments. Our project page with videos is at https://RchalYang.github.io/LocoTransformer
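    Conceptually, the fusion treats the proprioceptive state as one token and depth-image patches as the rest, letting self-attention mix the two modalities. The PyTorch sketch below illustrates this pattern; token sizes, the patch embedding, layer counts, and the action head are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    """Fuse a proprioceptive token with depth-image patch tokens."""
    def __init__(self, proprio_dim=48, d_model=128):
        super().__init__()
        self.proprio_embed = nn.Linear(proprio_dim, d_model)  # one state token
        self.patch_embed = nn.Conv2d(1, d_model, kernel_size=16, stride=16)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.head = nn.Linear(d_model, 12)  # assumption: 12-dim action output

    def forward(self, proprio, depth):
        # proprio: (B, proprio_dim); depth: (B, 1, 64, 64) -> 16 patch tokens
        state_tok = self.proprio_embed(proprio)[:, None]               # (B, 1, d)
        vis_toks = self.patch_embed(depth).flatten(2).transpose(1, 2)  # (B, 16, d)
        fused = self.encoder(torch.cat([state_tok, vis_toks], dim=1))
        return self.head(fused[:, 0])  # act from the fused state token
```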

    Inventory Centralization Decision Framework for Spare Parts


    Self-Supervised Policy Adaptation during Deployment

    In most real-world scenarios, a policy trained by reinforcement learning in one environment needs to be deployed in another, potentially quite different environment. However, generalization across different environments is known to be hard. A natural solution would be to keep training after deployment in the new environment, but this cannot be done if the new environment offers no reward signal. Our work explores the use of self-supervision to allow the policy to continue training after deployment without using any rewards. While previous methods explicitly anticipate changes in the new environment, we assume no prior knowledge of those changes yet still obtain significant improvements. Empirical evaluations are performed on diverse simulation environments from the DeepMind Control Suite and ViZDoom, as well as real robotic manipulation tasks in continuously changing environments, taking observations from an uncalibrated camera. Our method improves generalization in 31 out of 36 environments across various tasks and outperforms domain randomization on a majority of environments. Webpage and implementation: https://nicklashansen.github.io/PAD/
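    One common way to realize reward-free training of this kind is an inverse-dynamics auxiliary task: predict the action that connects consecutive observations and backpropagate into the shared encoder. The sketch below illustrates one such adaptation step in PyTorch; the module interfaces are illustrative assumptions rather than the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def adaptation_step(encoder, inv_dynamics, optimizer, obs, next_obs, action):
    """One self-supervised gradient step at deployment: predict the action
    connecting consecutive observations; no reward signal is required."""
    h, h_next = encoder(obs), encoder(next_obs)
    pred_action = inv_dynamics(torch.cat([h, h_next], dim=-1))
    loss = F.mse_loss(pred_action, action)
    optimizer.zero_grad()
    loss.backward()          # gradients flow into the shared encoder
    optimizer.step()
    return loss.item()
```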

    Electrical energy by electrode placement for cardioversion of atrial fibrillation: a systematic review and meta-analysis

    OBJECTIVE: Electrode patch position may not be critical for success when cardioverting atrial fibrillation (AF), but the relevance of applied electrical energy is unclarified. Our objective was to perform a meta-analysis of randomised trials to examine the dose-response relation between energy level and cardioversion success by electrode position in elective cardioversion.
    METHODS: We searched PubMed, Embase, The Cochrane Library, Google Scholar and Scopus Citations. Inclusion criteria were randomised controlled trials using biphasic shock waveforms and self-adhesive patches, and publication date from 2000 to 2023. We used random-effects dose-response models to meta-analyse the relation between energy level and cardioversion success for the anterolateral and anteroposterior positions. Random-effects models estimated pooled risk ratios (RR) for cardioversion success after the first and the final shocks between the two electrode positions.
    RESULTS: We included five randomised controlled trials (N=1078). After the first low-energy shock, the electrode position was not significantly associated with the likelihood of successful cardioversion (pooled RR anterolateral vs anteroposterior placement 1.28, 95% CI 0.93 to 1.76, with considerable heterogeneity). After a high-energy final shock, there was no evidence of an association between the electrode position and the cumulative chance of cardioversion success (pooled RR anterolateral vs anteroposterior 1.05, 95% CI 0.97 to 1.14). Regardless of electrode position, cardioversion success was significantly less likely with shock energy levels below 200 J than with 200 J.
    CONCLUSION: Evidence from contemporary randomised trials suggests that a higher level of electrical energy is associated with a higher conversion rate when cardioverting AF with a biphasic shock waveform. Positioning of the electrodes can be based on convenience.
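    For readers unfamiliar with the pooling step, the random-effects risk-ratio estimate the abstract refers to is conventionally computed on the log scale, as in the standard DerSimonian-Laird form sketched below (notation ours, not the paper's):

```latex
% For study i, let y_i = \log \mathrm{RR}_i with within-study variance v_i,
% and let \hat{\tau}^2 be the estimated between-study variance
% (obtained, e.g., from Cochran's Q).
\[
  w_i = \frac{1}{v_i + \hat{\tau}^2},
  \qquad
  \hat{\mu} = \frac{\sum_i w_i\, y_i}{\sum_i w_i},
  \qquad
  \widehat{\mathrm{RR}}_{\text{pooled}} = \exp(\hat{\mu}).
\]
```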

    GNFactor: Multi-Task Real Robot Learning with Generalizable Neural Feature Fields

    It is a long-standing problem in robotics to develop agents capable of executing diverse manipulation tasks from visual observations in unstructured real-world environments. To achieve this goal, the robot needs to have a comprehensive understanding of the 3D structure and semantics of the scene. In this work, we present GNFactor, a visual behavior cloning agent for multi-task robotic manipulation with Generalizable Neural feature Fields. GNFactor jointly optimizes a generalizable neural field (GNF) as a reconstruction module and a Perceiver Transformer as a decision-making module, leveraging a shared deep 3D voxel representation. To incorporate semantics in 3D, the reconstruction module utilizes a vision-language foundation model (e.g., Stable Diffusion) to distill rich semantic information into the deep 3D voxel. We evaluate GNFactor on 3 real robot tasks and perform detailed ablations on 10 RLBench tasks with a limited number of demonstrations. We observe a substantial improvement of GNFactor over current state-of-the-art methods in seen and unseen tasks, demonstrating the strong generalization ability of GNFactor. Our project website is https://yanjieze.com/GNFactor/
    Comment: CoRL 2023 Oral. Website: https://yanjieze.com/GNFactor/
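    The distillation idea, supervising a 3D voxel feature field with 2D features from a foundation model, can be sketched with a trilinear lookup in place of a full volumetric renderer. All shapes and the simplified sampling below are assumptions for illustration:

```python
import torch
import torch.nn.functional as F

def sample_voxel_features(voxel_grid, points):
    """Trilinear lookup of per-point features from a dense voxel grid.
    voxel_grid: (1, C, D, H, W); points: (1, N, 3) in [-1, 1]."""
    grid = points[:, :, None, None, :]                           # (1, N, 1, 1, 3)
    feats = F.grid_sample(voxel_grid, grid, align_corners=True)  # (1, C, N, 1, 1)
    return feats[:, :, :, 0, 0].transpose(1, 2)                  # (1, N, C)

def distillation_loss(voxel_grid, surface_points, target_feats):
    """Match voxel-field features at 3D points against 2D features produced
    by a vision-language model for the corresponding pixels."""
    pred = sample_voxel_features(voxel_grid, surface_points)  # (1, N, C)
    return F.mse_loss(pred, target_feats)
```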